Abstract

Bayesian probability theory is an inference calculus, which originates from a gen- 
eralization of inclusion on the Boolean lattice of logical assertions to a degree of 
inclusion represented by a real number. Dual to this lattice is the distributive lat- 
tice of questions constructed from the ordered set of down-sets of assertions, which 
forms the foundation of the calculus of inquiry — a generalization of information 
theory. In this paper we introduce this novel perspective on these spaces in which 
machine learning is performed and discuss the relationship between these results 
and several proposed generalizations of information theory in the literature. 
1 Introduction

It has been known for some time that probability theory can be derived
as a generalization of Boolean implication to degrees of implication
represented by real numbers [11,12]. Straightforward consistency
requirements dictate the form of the sum and product rules of probability,
and Bayes' theorem [11,12,47,46,20,34], which forms the basis of the
inferential calculus, also known as inductive inference. However, in machine
learning applications it is often more useful to rely on information theory
[45] in the design of an algorithm. On the surface, the connection between
information theory and probability theory seems clear: information depends
on entropy, and entropy is a logarithmically-transformed version of
probability. However, as I will describe, there is a great deal more
structure lying below this seemingly placid surface.

Great insight is gained by considering a set of logical statements as a Boolean 
lattice. I will show how this lattice of logical statements gives rise to a dual 
lattice of possible questions that can be asked. The lattice of questions has a 
measure, analogous to probability, which I will demonstrate is a generalized 
entropy. This generalized entropy not only encompasses information theory, 
but also allows for new quantities and relationships, several of which already 
have been suggested in the literature. 

A problem can be solved in either the space of logical statements or in the 
space of questions. By better understanding the fundamental structures of 
these spaces, their relationships to one another, and their associated calculi 
we can expect to be able to use them more effectively to perform automated 
inference and inquiry. 

In §2, I provide an overview of order theory, specifically partially-ordered
sets and lattices. I will introduce the notion of extending inclusion on a finite
lattice to degrees of inclusion, effectively extending the algebra to a calculus,
the rules of which are derived in the appendix. These ideas are used to recast
the Boolean algebra of logical statements and to derive the rules of the
inferential calculus (probability theory) in §3. I will focus on finite spaces of
statements rather than continuous spaces. In §4, I will use order theory to 
generate the lattice of questions from the lattice of logical statements. I will 
discuss how consistency requirements lead to a generalized entropy and the 
inquiry calculus, which encompasses information theory. In §5 I discuss the 
use of these calculi and their relationships to several proposed generalizations 
of information theory. 


2 Partially-Ordered Sets and Lattices 

2.1 Order Theory and Posets


In this section, I introduce some basic concepts of order theory that are
necessary in this development to understand the spaces of logical statements
and questions. Order theory works to capture the notion of ordering elements
of a set. The central idea is that one associates a set with a binary ordering
relation to form what is called a partially-ordered set, or a poset for short.
The ordering relation, generically written $\leq$, satisfies reflexivity,
antisymmetry, and transitivity, so that for elements $a$, $b$, and $c$ we have

P1. For all $a$, $a \leq a$ (Reflexivity)

P2. If $a \leq b$ and $b \leq a$, then $a = b$ (Antisymmetry)

P3. If $a \leq b$ and $b \leq c$, then $a \leq c$ (Transitivity)

The ordering $a \leq b$ is generally read '$b$ includes $a$'. In cases where
$a \leq b$ and $a \neq b$, we write $a < b$. If it is true that $a < b$, but
there does not exist an element $x$ in the set such that $a < x < b$, then we
write $a \prec b$, read '$b$ covers $a$', indicating that $b$ is a direct
successor to $a$ in the hierarchy induced by the ordering relation.

Fig. 1. Diagrams of posets described in the text. A. Natural numbers ordered by
'less than or equal to', B. $\Pi_3$, the lattice of partitions of three elements
ordered by 'is contained by', C. $2^3$, the lattice of all subsets of three
elements $\{a, b, c\}$ ordered by 'is a subset of'.

This concept of covering can be used to construct diagrams of posets. If an 
element b includes an element a then it is drawn higher in the diagram. If b 
covers a then they are connected by a line. These poset diagrams (or Hasse 
diagrams) are useful in visualizing the order induced on a set by an ordering 
relation. Figure 1 shows three posets. The first is the natural numbers ordered
by the usual 'is less than or equal to'. The second is $\Pi_3$, the lattice of
partitions of three elements. A partition $y$ includes a partition $x$,
$x \leq y$, when every cell of $x$ is contained in a cell of $y$. The third
poset, denoted $2^3$, is the powerset of the set of three elements
$\mathcal{P}(\{a, b, c\})$, ordered by set inclusion $\subseteq$. The orderings
in Figures 1b and 1c are called partial orders since some elements are
incomparable with respect to the ordering relation. For example, since neither
$\{a\} \leq \{b\}$ nor $\{b\} \leq \{a\}$ holds, the elements $\{a\}$ and
$\{b\}$ are incomparable, written $\{a\} \parallel \{b\}$. In contrast, the
ordering in Figure 1a is a total order, since all pairs of elements are
comparable with respect to the ordering relation.
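As a minimal sketch of how the covering relation generates a poset diagram, the following Python fragment enumerates the powerset of {a, b, c} ordered by set inclusion and lists its covering pairs and its incomparable pairs. The helper names (`powerset`, `covering_pairs`, `subset_leq`) are illustrative assumptions and do not come from the text.

```python
from itertools import combinations

def powerset(items):
    """Return all subsets of `items` as frozensets (the elements of 2^n)."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def covering_pairs(elements, leq):
    """Return pairs (a, b) such that b covers a under the ordering `leq`."""
    pairs = []
    for a in elements:
        for b in elements:
            if a == b or not leq(a, b):
                continue
            # b covers a when no third element lies strictly between them
            between = any(x != a and x != b and leq(a, x) and leq(x, b)
                          for x in elements)
            if not between:
                pairs.append((a, b))
    return pairs

def subset_leq(x, y):
    """The ordering relation 'is a subset of'."""
    return x <= y

elements = powerset("abc")
hasse_edges = covering_pairs(elements, subset_leq)
incomparable = [(x, y) for x, y in combinations(elements, 2)
                if not subset_leq(x, y) and not subset_leq(y, x)]

print(len(hasse_edges), "covering pairs (lines in the poset diagram)")
print(len(incomparable), "incomparable pairs")
```

Each covering pair corresponds to one line in the diagram of Figure 1c, and the pair ({a}, {b}) appears among the incomparable pairs, as stated in the text.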

A poset $P$ possesses a greatest element if there exists an element
$\top \in P$, called the top, where $x \leq \top$ for all $x \in P$. Dually,
the least element $\bot \in P$, called the bottom, exists when $\bot \leq x$
for all $x \in P$. For example, the top of $\Pi_3$ is the partition 123 where
all elements are in the same cell. The bottom of $\Pi_3$ is the partition
1|2|3 where each element is in its own cell. The elements that cover the
bottom are called atoms. For example, in $2^3$ the atoms are the singleton
sets $\{a\}$, $\{b\}$, and $\{c\}$.

Given a pair of elements $x$ and $y$, their upper bound is defined as the set
of all $z \in P$ such that $x \leq z$ and $y \leq z$. In the event that a
unique least upper bound exists, it is called the join, written $x \vee y$.
Dually, we can define the lower bound and the greatest lower bound, which, if
it exists, is called the meet, $x \wedge y$. Graphically, the join of two
elements can be found by following the lines upward until they first converge
on a single element. The meet can be found by following the lines downward.
In the powerset lattice $2^3$, the join $\vee$ corresponds to the set union
$\cup$, and the meet $\wedge$ corresponds to the set intersection $\cap$.
Elements that cannot be expressed as a join of two other elements are called
join-irreducible elements. In the lattice $2^3$, these elements are the atoms.

Last, the dual of a poset $P$, written $P^d$, can be formed by reversing the ordering
relation, which can be visualized by flipping the poset diagram upside-down. 
This action exchanges joins and meets and is the reason that their relations 
come in pairs, as we will see below. There are different notions of duality and 
the notion after which this paper is titled will be discussed later. 


2.2 Lattices 

A lattice L is a poset where the join and meet exist for every pair of elements. 
We can view the lattice as a set of objects ordered by an ordering relation, 
with the join $\vee$ and meet $\wedge$ describing the hierarchical structure of the lattice.
This is a structural viewpoint. However, we can also view the lattice from an 
operational viewpoint as an algebra on the space of elements. The algebra 
is defined by the operations $\vee$ and $\wedge$ along with any other relations induced
by the structure of the lattice. Dually, the operations of the algebra uniquely 
determine the ordering relation, and hence the lattice structure. Viewed as 
operations, the join and meet obey the following properties for all $x, y, z \in L$

L1. $x \vee x = x$, $x \wedge x = x$ (Idempotency)

L2. $x \vee y = y \vee x$, $x \wedge y = y \wedge x$ (Commutativity)

L3. $x \vee (y \vee z) = (x \vee y) \vee z$, $x \wedge (y \wedge z) = (x \wedge y) \wedge z$ (Associativity)

L4. $x \vee (x \wedge y) = x \wedge (x \vee y) = x$ (Absorption)

The fact that lattices are algebras can be seen by considering the consistency
relations, which express the relationship between the ordering relation and the
join and meet operations.

$$ x \leq y \iff \begin{cases} x \wedge y = x \\ x \vee y = y \end{cases} \qquad \text{(Consistency Relations)} $$


Lattices that obey the distributivity relation

D1. $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ (Distributivity of $\wedge$ over $\vee$)

and its dual

D2. $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$ (Distributivity of $\vee$ over $\wedge$)

are called distributive lattices. All distributive lattices can be expressed in
terms of elements consisting of sets ordered by set inclusion.

A lattice is complemented if for every element $x$ in the lattice, there exists a
unique element $\sim x$ such that

C1. $x \vee \sim x = \top$ (Complementation)

C2. $x \wedge \sim x = \bot$

Note that the lattice $2^3$ (Fig. 1c) is complemented, whereas the lattice $\Pi_3$
(Fig. 1b) is not.
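The properties L1 to L4, the consistency relations, D1, D2, C1 and C2 can be checked by brute force on the powerset of {a, b, c}, where the join is set union and the meet is set intersection. The sketch below is only an illustration on that one lattice; the function names are assumptions introduced here.

```python
from itertools import combinations, product

UNIVERSE = frozenset("abc")
ELEMENTS = [frozenset(c) for r in range(len(UNIVERSE) + 1)
            for c in combinations(sorted(UNIVERSE), r)]

def join(x, y):
    """Join in the powerset lattice: set union."""
    return x | y

def meet(x, y):
    """Meet in the powerset lattice: set intersection."""
    return x & y

def complement(x):
    """Candidate complement: the set difference from the top element."""
    return UNIVERSE - x

def complements_of(x):
    """All elements c satisfying both C1 and C2 for x."""
    return [c for c in ELEMENTS
            if join(x, c) == UNIVERSE and meet(x, c) == frozenset()]

def satisfies_boolean_lattice_laws():
    """Check L1-L4, the consistency relations, D1-D2 and C1-C2 exhaustively."""
    top, bottom = UNIVERSE, frozenset()
    for x, y, z in product(ELEMENTS, repeat=3):
        checks = [
            join(x, x) == x and meet(x, x) == x,                     # L1
            join(x, y) == join(y, x) and meet(x, y) == meet(y, x),   # L2
            join(x, join(y, z)) == join(join(x, y), z),              # L3 (join)
            meet(x, meet(y, z)) == meet(meet(x, y), z),              # L3 (meet)
            join(x, meet(x, y)) == x and meet(x, join(x, y)) == x,   # L4
            (x <= y) == (meet(x, y) == x),                           # consistency
            (x <= y) == (join(x, y) == y),                           # consistency
            meet(x, join(y, z)) == join(meet(x, y), meet(x, z)),     # D1
            join(x, meet(y, z)) == meet(join(x, y), join(x, z)),     # D2
            join(x, complement(x)) == top,                           # C1
            meet(x, complement(x)) == bottom,                        # C2
        ]
        if not all(checks):
            return False
    return True

print(satisfies_boolean_lattice_laws())
```

Calling `complements_of(x)` for each element returns exactly one candidate, which is the uniqueness required by the definition of a complemented lattice.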


2.3 Inclusion and the Incidence Algebra 


Inclusion on a poset can be quantified by a function called the zeta function

$$ \zeta(x,y) = \begin{cases} 1 & \text{if } x \leq y \\ 0 & \text{if } x \nleq y \end{cases} \qquad \text{(zeta function)} \tag{1} $$

which describes whether the element $y$ includes the element $x$. This function
belongs to a class of real-valued functions $f(x,y)$ of two variables defined on a
poset, which are non-zero only when $x \leq y$. This set of functions comprises the
incidence algebra of the poset [42]. The sum of two functions $f(x,y) + g(x,y)$
in the incidence algebra is defined the usual way by

$$ h(x,y) = f(x,y) + g(x,y), \tag{2} $$

as is multiplication by a scalar $h(x,y) = \lambda f(x,y)$. However, the product of
two functions is found by taking the convolution over the interval of elements
in the poset

$$ h(x,y) = \sum_{x \leq z \leq y} f(x,z)\, g(z,y). \tag{3} $$

To invert functions in the incidence algebra, one must rely on the Möbius
function $\mu(x,y)$, which is the inverse of the zeta function [44,42,3]

$$ \sum_{x \leq z \leq y} \zeta(x,z)\, \mu(z,y) = \delta(x,y), \tag{4} $$

where $\delta(x,y)$ is the Kronecker delta function. These functions are the
generalized analogues of the familiar Riemann zeta function and the Möbius
function in number theory, where the poset is the set of natural numbers ordered
by 'divides'. We will see that they play an important role both in inferential
reasoning as an extension of inclusion on the Boolean lattice of logical
statements, and in the quantification of inquiry.
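To make the incidence algebra concrete, the following sketch computes the Möbius function on the powerset of {a, b, c} by solving relation (4) recursively, and then verifies that the convolution (3) of the zeta and Möbius functions gives the Kronecker delta. The recursion and the function names are assumptions chosen for illustration; they are one straightforward way to invert the zeta function on a small finite poset.

```python
from functools import lru_cache
from itertools import combinations

ELEMENTS = [frozenset(c) for r in range(4) for c in combinations("abc", r)]

def zeta(x, y):
    """Zeta function, Eq. (1): 1 if y includes x, else 0."""
    return 1 if x <= y else 0

@lru_cache(maxsize=None)
def mobius(x, y):
    """Moebius function obtained by solving Eq. (4) recursively."""
    if x == y:
        return 1
    if not x <= y:
        return 0
    # For x < y, Eq. (4) reads: sum over x <= z <= y of mu(z, y) = 0,
    # hence mu(x, y) = -(sum over x < z <= y of mu(z, y)).
    return -sum(mobius(z, y) for z in ELEMENTS if x < z and z <= y)

def convolve(f, g, x, y):
    """Product in the incidence algebra, Eq. (3)."""
    return sum(f(x, z) * g(z, y) for z in ELEMENTS if x <= z and z <= y)

def delta(x, y):
    """Kronecker delta on poset elements."""
    return 1 if x == y else 0

inverse_ok = all(convolve(zeta, mobius, x, y) == delta(x, y)
                 for x in ELEMENTS for y in ELEMENTS)
print(inverse_ok)
```

On this Boolean lattice the values alternate in sign with the number of elements separating `x` from `y`, which is the alternating pattern behind the inclusion-exclusion principle.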


2.4 Degrees of Inclusion


It is useful to generalize this notion of inclusion on a poset. I first introduce
the dual of the zeta function, $\zeta^d(x,y)$, which quantifies whether $x$
includes $y$, that is

$$ \zeta^d(x,y) = \begin{cases} 1 & \text{if } x \geq y \\ 0 & \text{if } x \ngeq y \end{cases} \qquad \text{(dual of the zeta function)} \tag{5} $$

Note that the dual of the zeta function on a poset $P$ is equivalent to the zeta
function defined on its dual $P^d$, since the ordering relation is simply reversed.
I will generalize inclusion by introducing the function $z(x,y)$,¹

$$ z(x,y) = \begin{cases} 1 & \text{if } x \geq y \\ 0 & \text{if } x \wedge y = \bot \\ z & \text{otherwise, where } 0 < z < 1 \end{cases} \qquad \text{(degrees of inclusion)} \tag{6} $$

where inclusion on the poset is generalized to degrees of inclusion represented
by real numbers.² This new function quantifies the degree to which $x$ includes
$y$. This generalization is asymmetric in the sense that the condition where
$\zeta^d(x,y) = 1$ is preserved, whereas the condition where $\zeta^d(x,y) = 0$
has been modified. The motivation here is that, if we are certain that $x$
includes $y$, then we want to indicate this knowledge. However, if we know that
$x$ does not include $y$, then we can quantify the degree to which $x$ includes
$y$. In this sense, the algebra is extended to a calculus. Later, I will
demonstrate the utility of such a generalization.

¹ I have dropped the $d$ symbol since the definition is clear.

² This function need not be normalized to unity, as we will see later.

The values of the function z must be consistent with the poset structure. In 
the case of a lattice, when the arguments are transformed using the algebraic 
manipulations of the lattice, the corresponding values of z must be consistent 
with these transformations. By enforcing this consistency, we can derive the 
rules by which the degrees of inclusion are to be manipulated. This method 
of requiring consistency with the algebraic structure was first used by Cox 
to prove that the sum and product rules of probability theory are the only 
rules consistent with the underlying Boolean algebra [11,12]. The rules for the
distributive lattices I will describe below are derived in the appendix, and the 
general methodology is discussed in greater detail elsewhere [34]. 

Consider a distributive lattice $\mathcal{D}$ and elements $x, y, t \in \mathcal{D}$.
Given the degree to which $x$ includes $t$, $z(x,t)$, and the degree to which
$y$ includes $t$, $z(y,t)$, we would like to be able to determine the degree to
which the join $x \vee y$ includes $t$, $z(x \vee y, t)$. In the appendix, I show
that consistency with associativity of the join requires that

$$ z(x \vee y, t) = z(x,t) + z(y,t) - z(x \wedge y, t). \tag{7} $$

For a join of multiple elements $x_1, x_2, \ldots, x_n$, this degree is found by

$$ z(x_1 \vee x_2 \vee \cdots \vee x_n, t) = \sum_i z(x_i, t) - \sum_{i<j} z(x_i \wedge x_j, t) + \sum_{i<j<k} z(x_i \wedge x_j \wedge x_k, t) - \cdots, \tag{8} $$

which I will call the sum rule for distributive lattices. This sum rule exhibits
Gian-Carlo Rota's inclusion-exclusion principle, where terms are added and
subtracted to avoid double-counting of the join-irreducible elements in the
join [28,42,3]. The inclusion-exclusion principle is a consequence of the Möbius
function for distributive lattices, which leads to an alternating sum and
difference as one sums down the interval in the lattice. This demonstrates that
the form of the sum rule is inextricably tied to the underlying lattice structure
[34].
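The sum rule can be illustrated numerically on a lattice of sets. The sketch below assumes a simple additive valuation for a fixed context $t$: each atom carries a weight, and the degree assigned to a set is the total weight of its atoms. The weights and helper names are hypothetical and serve only to exercise the alternating sum.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights on the atoms (join-irreducible elements).
WEIGHTS = {"a": Fraction(1, 2), "k": Fraction(1, 3), "n": Fraction(1, 6)}

def z(x):
    """Assumed valuation z(x, t) for a fixed t: total weight of the atoms in x."""
    return sum((WEIGHTS[atom] for atom in x), Fraction(0))

def join_all(xs):
    """Join of several sets in a lattice of sets: their union."""
    return frozenset().union(*xs)

def sum_rule(xs):
    """Right-hand side of the sum rule: alternating sum over meets of the x_i."""
    total = Fraction(0)
    for r in range(1, len(xs) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(xs, r):
            total += sign * z(frozenset.intersection(*group))
    return total

pair = [frozenset("ak"), frozenset("kn")]
triple = [frozenset("ak"), frozenset("kn"), frozenset("an")]
print(z(join_all(pair)) == sum_rule(pair))
print(z(join_all(triple)) == sum_rule(triple))
```

In the three-element case the pairwise meets are the atoms themselves, so subtracting them removes exactly the weight that the first sum counted twice.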

For the meet of two elements $x \wedge y$, it is clear that we can use (7) to obtain

$$ z(x \wedge y, t) = z(x,t) + z(y,t) - z(x \vee y, t). \tag{9} $$

However, another useful form can be obtained by requiring consistency with
distributivity. In the appendix, I show that this consistency constraint leads
to

$$ z(x \wedge y, t) = C\, z(x,t)\, z(y, x \wedge t), \tag{10} $$

which is the product rule for distributive lattices. The constant $C$ acts as a
normalization factor, and is necessary when these degrees are normalized to
values other than unity.


Last, requiring consistency with commutativity of the meet leads to the analog
of Bayes' Theorem for distributive lattices

$$ z(y, x \wedge t) = \frac{z(y,t)\, z(x, y \wedge t)}{z(x,t)}. \tag{11} $$

One does not typically think of Bayes' Theorem outside of the context of
probability theory. However, it is a general rule that is applicable to all
distributive lattices. As I will demonstrate, it can be used in computing
probabilities among logical assertions, as well as in working with questions.
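The step from the product rule (10) to (11) can be written out explicitly. The derivation below is a short sketch that assumes $z(x,t) \neq 0$ and $C \neq 0$ so that the division is permitted.

```latex
\begin{align*}
z(x \wedge y, t) &= C\, z(x,t)\, z(y, x \wedge t)
  && \text{product rule (10)} \\
z(y \wedge x, t) &= C\, z(y,t)\, z(x, y \wedge t)
  && \text{product rule (10) with } x \text{ and } y \text{ exchanged} \\
C\, z(x,t)\, z(y, x \wedge t) &= C\, z(y,t)\, z(x, y \wedge t)
  && \text{since } x \wedge y = y \wedge x \\
z(y, x \wedge t) &= \frac{z(y,t)\, z(x, y \wedge t)}{z(x,t)}
  && \text{dividing by } C\, z(x,t)
\end{align*}
```

The normalization constant cancels, which is why it does not appear in (11).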


2.5 Measures and Valuations 


The fact that one can define functions that take lattice elements to real
numbers was utilized by Rota, who used this to develop and promote the field of
geometric probability [43,28]. The more familiar term measure typically refers
to a function $\mu$ defined on a Boolean lattice $\mathcal{B}$, which takes
elements of a Boolean lattice to a real number. For example, given
$x \in \mathcal{B}$, $\mu : x \mapsto \mathbb{R}$. The term valuation is more
general and refers to a function that takes a lattice element $x \in \mathcal{L}$
to a real number, $v : x \mapsto \mathbb{R}$, or more generally to an element of
a commutative ring with identity [44], $v : x \mapsto A$. The function $z$
introduced above (6) is a bi-valuation since it takes two lattice elements
$x, y \in \mathcal{L}$ as its arguments, $z : x, y \mapsto \mathbb{R}$. When
applied to a Boolean lattice, the function $z$ is also a measure.


3 Logical Statements 


George Boole [8,9] was the first to understand the algebra of logical state- 
ments, which I will interchangeably call logical assertions. Boole’s algebra is 
so familiar, that I will spend little effort in describing it. In this algebra, there 
are two binary operations called conjunction (AND) and disjunction (OR), and
one unary operation called complementation (NOT). The binary operations 
are commutative, associative, and distributive. 

Let us now adopt a different perspective, where we view this Boolean structure
as a set of logical statements ordered by logical implication. A statement $x$
includes a statement $y$, $y \leq x$, when $y$ implies $x$, written
$y \rightarrow x$. Thus the ordering relation $\leq$ is represented by
$\rightarrow$. Logical implication as an ordering relation among a set of
logical assertions sets up a partial order on the set and forms a Boolean
lattice. The join and meet are identified with the disjunction and conjunction,
respectively. In this case the order-theoretic notation for the join and the
meet conveniently matches the logical notation for the disjunction and
conjunction. However, one must remember that the join and meet describe
different operations in different lattices. A Boolean lattice follows L1-L4, D1,
D2, C1 and C2, which is neatly summarized by saying that it is a complemented
distributive lattice.

Fig. 2. The lattice of assertions $\mathcal{A}_3$ generated by three atomic
assertions $a$, $k$, and $n$. The ideals of this lattice form the lattice of
ideal questions $\mathcal{I}_3$ ordered by set inclusion $\subseteq$, which is
isomorphic to $\mathcal{A}_3$. The maximum of the set of statements in any ideal
maps back to the assertion lattice. The statements comprising three ideals are
highlighted on the right.

To better picture this, consider a simple example [33] concerning the matter 
of ‘Who stole the tarts made by the Queen of Hearts?’ For the sake of this 
example, let us say that there are three mutually exclusive statements, one of 
which answers this question: 

$a$ = 'Alice stole the tarts!'
$k$ = 'The Knave of Hearts stole the tarts!'
$n$ = 'No one stole the tarts!'

The lattice $\mathcal{A}_3$ generated by these assertions is shown in Figure 2.
The bottom element of the lattice is called the logical absurdity. It is always
false, and as such, it implies every other statement in the lattice. The three
statements $a$, $k$, and $n$ are the atoms which cover the bottom. All other
logical statements in this space can be generated from joins of these three
statements. For example, the statement $a \vee k$ is the statement 'Either Alice
or the Knave stole the tarts!' The top element $\top = a \vee k \vee n$ is called
the truism, since it is trivially true that 'Either Alice, the Knave, or nobody
stole the tarts!'. The truism is implied by every other statement in the
lattice. Since the lattice is Boolean, each element in the lattice has a unique
complement (C1, C2). The statement $a$ = 'Alice stole the tarts!' has as its
complement the statement $\sim a = k \vee n$ = 'Either the Knave or no one stole
the tarts!' This statement $\sim a$ is equivalent to 'Alice did not steal the
tarts!' Last, note that this lattice (Fig. 2) is isomorphic to the powerset
lattice $2^3$ (Fig. 1c); therefore, Boolean algebra describes the operations on
a powerset as well as implication on a set of logical statements.


3.1 The Origin of Probability Theory 


Deductive reasoning describes the act of using the Boolean lattice structure to 
determine whether one logical statement implies another given partial knowl- 
edge of relations among a set of logical statements. From the perspective of 
posets, this equates to determining whether one element of a poset includes 
another element given some partial knowledge of inclusion among a set of poset 
elements. Since inclusion on a poset is encoded by the zeta function $\zeta$ and its
dual $\zeta^d$ (5), either of these functions can be used to quantify implication on
$\mathcal{A}$ and perform deductive reasoning.

Inductive reasoning or inference is different from deductive reasoning in the
sense that it incorporates a notion of uncertainty not found in the Boolean
lattice structure. Just as $\zeta^d$ quantifies deductive reasoning, its
generalization $z$ quantifies inductive reasoning. Probability³ is simply this
function $z$ (6) defined on the Boolean lattice $\mathcal{A}$,

$$ p(x \mid y) = z(x,y), \tag{12} $$

so that implication on the lattice is generalized to degrees of implication
represented by real numbers

$$ p(y \mid x) = \begin{cases} 1 & \text{if } x \rightarrow y \\ 0 & \text{if } x \wedge y = \bot \\ p & \text{otherwise, where } 0 < p < 1 \end{cases} \qquad \text{(probability)} \tag{13} $$

To make this more concrete, consider the example in Figure 2. Clearly,
$\top \geq a$, which is equivalent to $a \rightarrow \top$, so that
$\zeta^d(\top, a) = 1$ and $p(\top \mid a) = 1$. Now, $\top \nleq a$ and
$a \wedge \top = a$, therefore $\zeta^d(a, \top) = 0$ and $p(a \mid \top) = p$
where $0 < p < 1$. While the truism, $\top$ = 'Either Alice or the Knave or no
one stole the tarts!', does not imply that $a$ = 'Alice stole the tarts!', the
degree to which the premise implies that Alice stole the tarts,
$p(a \mid \top)$, is a very useful quantity. This is the essence of inductive
reasoning.

³ I could call this degree of implication plausibility or perhaps by a new term;
however, we will see that this quantity follows all of the rules of probability
theory. Since there is neither an operational nor a mathematical difference
between this degree of implication and probability, I see no need to indicate a
difference semantically.

Since probability is the function $z$ defined on the Boolean lattice
$\mathcal{A}$, the rules by which probabilities may be manipulated derive
directly from requiring that probability be consistent with the underlying
Boolean algebra. Since Boolean algebras are distributive, we have already shown
that there are three rules for manipulating probabilities: the sum rule of
probability

$$ p(x \vee y \mid t) = p(x \mid t) + p(y \mid t) - p(x \wedge y \mid t), \tag{14} $$

which is equivalent to (7), the product rule of probability

$$ p(x \wedge y \mid t) = p(x \mid t)\, p(y \mid x \wedge t), \tag{15} $$

which is equivalent to (10) with $C = 1$, and Bayes' theorem

$$ p(y \mid x \wedge t) = \frac{p(y \mid t)\, p(x \mid y \wedge t)}{p(x \mid t)}, \tag{16} $$

which is equivalent to (11). These three rules constitute the inferential
calculus, which is a generalization of the Boolean algebra of logical
assertions. There are several very important points to be made here.
Probabilities are functions of pairs of logical statements and quantify the
degree to which one logical statement implies another. For this reason, they are
necessarily conditional. Since the rules by which probabilities are to be
manipulated derive directly from consistency with the underlying Boolean
algebra, probability theory is literally an extension of logic, as argued so
effectively by E.T. Jaynes [25].

3.2 Join-Irreducible Statements and Prior Probabilities


The join-irreducible elements of the Boolean lattice of logical statements are
the atoms that cover the absurdity $\bot$. This set of atomic statements
$\{a_i\}$ comprises the exhaustive set of mutually exclusive assertions that
form the basis of this lattice. All other statements in the lattice can be found
by taking joins of these atoms. Given assignments of the prior probabilities for
the set of $\{a_i\}$ (e.g., $p(a_i \mid \top)$), the prior probabilities for all
other statements in the lattice can be computed using the sum rule of the
inferential calculus. This was proven by Gian-Carlo Rota [44, Theorem 1,
Corollary 2, p.35] who showed that:

Theorem 1 (Rota, Assigning Valuations [44]) A valuation in a finite
distributive lattice $L$ is uniquely determined by the values it takes on the
set of join-irreducibles of $L$, and these values can be arbitrarily assigned.

Furthermore, given the prior probability, that is, the degree to which $\top$
implies an assertion $x$, the degree to which any other element of the lattice
implies $x$ can be found via the product rule

$$ p(x \mid y) = p(x \mid y \wedge \top) = \frac{p(x \wedge y \mid \top)}{p(y \mid \top)}. \tag{17} $$

Thus by assigning the prior probability that the truism implies each of the
atoms, all the other probabilities can be uniquely determined using the
inferential calculus.
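As a hedged illustration of this procedure, the sketch below assigns hypothetical prior probabilities to the three atoms of the tart example, represents each statement as the set of atoms it joins, and computes every other probability from the sum rule and relation (17). The prior values and function names are assumptions made for the example; the text does not prescribe them.

```python
from fractions import Fraction

# Hypothetical priors p(atom | truism); they sum to one but are otherwise arbitrary.
PRIOR = {"a": Fraction(1, 2), "k": Fraction(1, 3), "n": Fraction(1, 6)}
TOP = frozenset(PRIOR)          # the truism a v k v n
BOTTOM = frozenset()            # the absurdity

def p_given_top(x):
    """p(x | truism): sum rule over the mutually exclusive atoms that make up x."""
    return sum((PRIOR[atom] for atom in x), Fraction(0))

def p(x, y):
    """p(x | y) = p(x ^ y | truism) / p(y | truism), following relation (17)."""
    denominator = p_given_top(y)
    if denominator == 0:
        raise ValueError("cannot condition on a statement with zero prior")
    return p_given_top(x & y) / denominator

a, k, n = frozenset("a"), frozenset("k"), frozenset("n")

print(p(a, a | k))      # degree to which 'Alice or the Knave' implies 'Alice'
print(p(TOP, a))        # a implies the truism
print(p(a, k | n))      # mutually exclusive statements
```

With these assumed priors, `p(a, a | k)` evaluates to 3/5, `p(TOP, a)` to 1, and `p(a, k | n)` to 0, matching the three cases of the definition of probability on the lattice.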

What is even more remarkable here is that Rota proved that the values of
the prior probabilities can be arbitrarily assigned . This means that there are 
no constraints imposed by the lattice structure, or equivalently the Boolean 
algebra, on the values of the prior probabilities. Thus the inferential calculus 
tells us nothing about assigning prior probabilities. Objective assignments 
can only be made by relying on additional consistency principles, such as 
symmetry, constraints, and consistency with other aspects of the problem at 
hand. Examples of useful principles are Jaynes’ Principle of Maximum Entropy 
[23] and his Principle of Group Invariance [22], which is a generalization of 
the Principle of Indifference [6,35,27]. Once these assignments are made, the 
inferential calculus, induced by consistency with order-theoretic principles, 
dictates the remaining probabilities. 


3.3 Remarks on Lattice Products 


Two spaces of logical statements can be combined by taking the lattice prod- 
uct, which can be written as the Cartesian product of the lattice elements. By 
equating the bottom elements of the two spaces, we get a distributive lattice. 
Such products of lattices are very important in inference, since it is exactly 
what one does when one takes a lattice of hypotheses and combines them with 
data. The product rule and Bayes’ theorem are extremely useful in these sit- 
uations where the prior probabilities are assigned on the two lattices forming 
the product. These issues are discussed in more detail elsewhere [34]. 


4 Questions 


In his last published work exploring the relationships between inference and 
inquiry [13], Cox defined a question as the set of all logical statements that 
answer it. At first glance, this definition is strikingly simple. However, with 
further thought one sees that it captures the essence of a question and does 
so in a form that is accessible to mathematical investigation. 
In the previous section on logical statements, the modern viewpoint of lattice
theory may seem like overkill. Its heavy mathematics are not necessary to 
reach the same conclusions that one reaches by simply working with Boole’s 
algebra. In addition, while some new insight is gained, there is little there
to change how one uses probability theory to solve inference problems. Here 
however, we will find lattice theory to be of great advantage by enabling us to 
visualize relations among sets of assertions that comprise the sets of answers 
to questions. 


4.1 Down-sets and Ideals


If a logical statement $x$ answers a question, then any statement $y$ such that
$y \rightarrow x$, or equivalently in order-theoretic notation, $y \leq x$,
answers the same question. Thus a question is not defined by just any set of
logical statements; it is defined by a set that is closed when going down the
assertion lattice. Such a set is called a down-set [14].

Definition 1 (Down-set) A down-set is a subset $J$ of an ordered set $L$,
written $J = \downarrow L$, where if $a \in J$, $x \in L$, $x \leq a$ then
$x \in J$.


Let us begin exploring questions by considering the down-set formed from a
set containing a single element $\{x\}$, which we write⁴ as
$X = \downarrow \{x\} = \downarrow x$. Given any logical statement $x$ in the
Boolean lattice of assertions, we can consider the down-set formed from that
assertion $x$

$$ X = \downarrow x = \{\, y \mid y \leq x,\ y \in \mathcal{A} \,\}. \tag{18} $$

Such a down-set is called an ideal [7,14], and to emphasize this I have called
these questions ideal questions to denote the fact that they are ideals of the
assertion lattice $\mathcal{A}$ [32].

We are now in a position to compare questions. Two questions are equivalent
if they ask the same thing, or equivalently when they are answered by the
same set of assertions. The questions 'Is it raining?' and 'Is it not raining?'
are both answered by either the statement 'It is raining!' or the statement
'It is not raining!' and all the statements that imply them. Thus our two
questions ask the same thing and are therefore equivalent. Furthermore, if one
question $X$ is a subset of another question $Y$ in a space of questions
$\mathcal{Q}$, then answering the question $X$ will necessarily answer the
question $Y$. This means that we can use the binary ordering relation 'is a
subset of' to implement the ordering relation 'answers' and therefore order the
set of questions.

⁴ Note that I am using lowercase letters to represent logical statements, uppercase
letters to represent questions, and script letters to represent an entire lattice.
The set of ideal questions $\mathcal{I}$ ordered by set inclusion forms a
lattice (Fig. 2) isomorphic to the original assertion lattice [7]. Thus there is
a one-to-one onto mapping from each statement $x \in \mathcal{A}$ to its ideal
question $X \in \mathcal{Q}$. The atomic assertions map to atomic questions,
each of which has only two possible answers. For example, the statement $k$
from our previous example maps to

$$ K = \downarrow k = \{k, \bot\}, \tag{19} $$

which is answered by either 'The Knave stole the tarts!' or the absurdity $\bot$.
Robert Fry calls these atomic questions elementary questions [17], since you
basically receive either exactly what you asked or no useful answer. The
non-atomic statements map to more complex questions, such as

$$ KN = \downarrow (k \vee n) = \{k \vee n, k, n, \bot\}, \tag{20} $$

where the symbol $KN$ is considered to be a single symbol representing a
single question formed from the down-set of the join of the statements $k$ and
$n$. Similarly, I will use $AKN$ to represent the question
$AKN = \downarrow (a \vee k \vee n)$. The lattice of ideal questions
$\mathcal{I}$ can be mapped back to the lattice of assertions $\mathcal{A}$
by selecting the maximum element in the set.
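The correspondence between statements and ideal questions can be sketched in a few lines of Python. Statements are represented here as sets of atom labels, with the empty set standing in for the absurdity; this encoding and the names `ideal` and `to_assertion` are assumptions made for illustration.

```python
from itertools import combinations

ATOMS = ("a", "k", "n")
# Every statement is a join of atoms; frozenset() plays the role of the absurdity.
STATEMENTS = [frozenset(c) for r in range(len(ATOMS) + 1)
              for c in combinations(ATOMS, r)]

def ideal(x):
    """Down-set of a single statement x: every statement that implies x."""
    return frozenset(y for y in STATEMENTS if y <= x)

def to_assertion(question):
    """Map an ideal question back to its maximum statement."""
    top = max(question, key=len)
    if not all(y <= top for y in question):
        raise ValueError("the question is not an ideal: it has no maximum")
    return top

K = ideal(frozenset("k"))       # corresponds to Eq. (19)
KN = ideal(frozenset("kn"))     # corresponds to Eq. (20)

print(sorted("".join(sorted(s)) for s in KN))
print(to_assertion(KN) == frozenset("kn"))
```

Because `ideal(x) <= ideal(y)` holds exactly when `x <= y`, the mapping preserves the ordering in both directions, which is the isomorphism between the assertion lattice and the lattice of ideal questions.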


4.2 The Lattice of Questions


We can construct more complex questions by considering down-sets, which
are set unions of the ideals of the assertion lattice. For example, the question
$T$ = 'Who stole the tarts?' is formed from the union of the three elementary
questions

$$ T = A \cup K \cup N. \tag{21} $$

Since

$$ \begin{aligned} A &= \downarrow a = \{a, \bot\} \\ K &= \downarrow k = \{k, \bot\} \\ N &= \downarrow n = \{n, \bot\} \end{aligned} \tag{22} $$

the question $T = A \cup K \cup N$ can be written as

$$ T = \{a, k, n, \bot\}. \tag{23} $$

In this way, the question $T$ is defined by its set of possible answers, including
the absurdity. We could also ask the binary question $B$ = 'Did or did not Alice
steal the tarts?'. This question can be written as the down-set formed from
$a$ = 'Alice stole the tarts!' and its complement $\sim a$ = 'Either the Knave or
no one stole the tarts!'

$$ \begin{aligned} B &= \downarrow a \cup \downarrow \sim a \\ &= \downarrow a \cup \downarrow (k \vee n) \\ &= \{a, \bot\} \cup \{k \vee n, k, n, \bot\} \\ &= \{k \vee n, a, k, n, \bot\}, \end{aligned} \tag{24} $$

where any one of the statements in the set will answer the question $B$. We
can write $B$ compactly as $B = A \cup KN$.

Fig. 3. The ordered set of down-sets of the lattice of assertions $\mathcal{A}$
results in the lattice of all possible questions
$\mathcal{Q} = \mathcal{O}(\mathcal{A})$ ordered by $\subseteq$. The
join-irreducible elements of $\mathcal{Q}$ are the ideal questions, which are
isomorphic to the lattice of assertions,
$\mathcal{A} \cong \mathcal{J}(\mathcal{Q})$. The down-sets corresponding to
several questions, including the questions $T$ and $B$ discussed in the text,
are illustrated on the right.

This construction produces every possible question given the lattice of
assertions $\mathcal{A}$; see Figure 3. Since the questions are sets, the set of
questions ordered by set inclusion $\subseteq$ forms a poset, which is a
distributive lattice [7,14]. More specifically, this construction results in the
ordered set of down-sets of $\mathcal{A}$, which is written
$\mathcal{O}(\mathcal{A})$. Thus the Boolean lattice $2^N$ is mapped to
the free distributive lattice $FD(N)$. Even though $FD(N)$ is a lattice of sets
and is distributive, it is not complemented [32]. Thus questions in general have
no complements.

The question lattice $\mathcal{Q}$ is closed under set union and set intersection, which 
correspond to the join and the meet, respectively. Therefore, 
$T = A \cup K \cup N = A \vee K \vee N$. Unfortunately, the terminology introduced by Cox is at odds with 
the order-theoretic terminology, since the joint question is formed from the 
meet of two questions, and it asks what the two questions ask jointly, whereas 
the join of two questions, the common question, asks what the two questions 
ask in common. 

Consider the two questions T and B. Since $T \subseteq B$, the question T necessarily 
answers the question B. Thus asking ‘Who stole the tarts?’ will resolve ‘Did 
or did not Alice steal the tarts?’ The converse is not true, since if one asks 
‘Did or did not Alice steal the tarts?’, the reply could be ‘Either the Knave 
or no one stole the tarts!’, which still does not answer ‘Who stole the tarts?’ 
Thus the question T lies below the question B in the lattice $\mathcal{Q}$, indicating that 
T answers B. 

The consistency relations (discussed in §2.2) can be used to better visualize 
these relationships. Consider again the two questions T = ‘Who stole the 
tarts?’ and B = ‘Did or did not Alice steal the tarts?’. The join of these two 
questions $T \vee B$ asks what they ask in common, which is ‘Did or did not Alice 
steal the tarts?’. Their meet $T \wedge B$, in contrast, asks what they ask jointly, which 
is ‘Who stole the tarts?’ So we have that 

$$T \vee B = B, \qquad T \wedge B = T,$$

which, by the consistency relations, implies that 

$$T \subseteq B.$$

This can also be determined by taking the set union for the join and the set 
intersection for the meet, and working with expressions for the sets defining 
T and B. 
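The set manipulations of this section can be checked mechanically. The following is a minimal sketch, not part of the original development: it assumes each assertion is encoded as a frozenset of the atoms a, k, n, with the empty frozenset standing for the absurdity, and the helper names `down` and `is_down_set` are hypothetical.

```python
from itertools import combinations

ATOMS = "akn"  # assumed encoding: a = Alice, k = the Knave, n = no one


def down(x):
    """Down-set of the assertion x (an iterable of atoms): every assertion
    implying x, including the absurdity (the empty frozenset)."""
    atoms = sorted(x)
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )


# The eight assertions of the Boolean lattice 2^3.
ASSERTIONS = list(down(ATOMS))


def is_down_set(q):
    """True if q contains every assertion lying below each of its members."""
    return all(down(x) <= q for x in q)


# Enumerate every down-set of the assertion lattice (the empty set included).
DOWN_SETS = []
for r in range(len(ASSERTIONS) + 1):
    for members in combinations(ASSERTIONS, r):
        q = frozenset(members)
        if is_down_set(q):
            DOWN_SETS.append(q)

A, K, N, KN = down("a"), down("k"), down("n"), down("kn")
T = A | K | N   # 'Who stole the tarts?'
B = A | KN      # 'Did or did not Alice steal the tarts?'

print(len(DOWN_SETS))           # how many down-sets the lattice 2^3 has
print(T <= B)                   # whether T lies below B
print(T | B == B, T & B == T)   # join and meet taken as union and intersection
```

Representing questions as plain sets keeps the join and meet as the built-in union and intersection operators, which mirrors the way the text computes them by hand.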


4.3 The Central Issue 


Just as statements can be true or false, questions can be real or vain. Cox 
defined a real question as a question that is answered by at least one true 
statement, whereas a vain question is a question that is answered by no true 
statement [13]. In Lewis Carroll’s Alice’s Adventures in Wonderland, it turned 
out that no one stole the tarts. Thus, any question not allowing for that 
possibility is a vain question: there does not exist a true answer that will 
resolve the issue. 

When the truth values of the statements are not known, a question Q is only 
assured to be a real question when it is answered by every one of the atomic 
statements of $\mathcal{A}$, or equivalently when $Q \wedge Q_i \neq \bot$ for all elementary questions 
$Q_i \in \mathcal{Q}$. Put simply, all possibilities must be accounted for. Previously, I called 
these questions assuredly real questions [32], which I will shorten here to real 
questions. The set of real questions is a sublattice $\mathcal{R}$ of the lattice $\mathcal{Q}$. That is, 
it is closed under joins and meets. 

The bottom of $\mathcal{R}$ is the smallest real question, and it answers all other questions 
in $\mathcal{R}$. It is formed from the join of all of the elementary questions, and as such 
it does not accept an ambiguous answer. For this reason, I call it the central 
issue. In our example, the central issue is the question T = ‘Who stole the 
tarts?’. Resolving the central issue will answer all the other real questions. 
Recall that by answering T = ‘Who stole the tarts?’, we necessarily will have 
answered B = ‘Did or did not Alice steal the tarts?’ As one ascends the real 
sublattice, the questions become more and more ambiguous. For example, the 
question $AN \cup KN$ will narrow down the inquiry, resolving whether it was 
Alice or the Knave, but not necessarily ruling out that no one stole the tarts. 
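The real sublattice of the tart example is small enough to enumerate. The sketch below is illustrative only and rests on an assumed encoding (assertions as frozensets of the atoms a, k, n, the empty frozenset as the absurdity); the names `QUESTIONS`, `REAL`, and `central_issue` are hypothetical.

```python
from itertools import combinations


def down(x):
    """Down-set of assertion x (an iterable of atoms), absurdity included."""
    atoms = sorted(x)
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )


ASSERTIONS = list(down("akn"))           # the eight assertions of 2^3
ATOMIC = [frozenset(c) for c in "akn"]   # the atomic statements a, k, n

# Questions: the non-empty down-sets of the assertion lattice.
QUESTIONS = []
for r in range(1, len(ASSERTIONS) + 1):
    for members in combinations(ASSERTIONS, r):
        q = frozenset(members)
        if all(down(x) <= q for x in q):
            QUESTIONS.append(q)

# Assuredly real questions: answered by every one of the atomic statements.
REAL = [q for q in QUESTIONS if all(x in q for x in ATOMIC)]

T = down("a") | down("k") | down("n")
central_issue = frozenset.intersection(*REAL)   # bottom of the real sublattice
is_sublattice = all(
    (p | q) in REAL and (p & q) in REAL for p in REAL for q in REAL
)

print(len(REAL), central_issue == T, is_sublattice)
```

Taking the intersection of all real questions is one simple way to find the bottom of the sublattice; it relies on the closure under meets stated in the text.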


4.4 Duality between the Assertions and Questions 


The question lattice $\mathcal{Q}$ is formed by taking the ordered set of down-sets of the 
assertion lattice, which can be represented by the map $\mathcal{A} \mapsto \mathcal{O}(\mathcal{A})$, so that 
$\mathcal{Q} = \mathcal{O}(\mathcal{A})$. The join-irreducible questions $\mathcal{J}(\mathcal{Q})$ are the ideal questions, which 
by themselves form a lattice that is isomorphic to the assertion lattice $\mathcal{A}$, which 
can be represented by the map $\mathcal{Q} \mapsto \mathcal{J}(\mathcal{Q})$. Thus we have two isomorphisms 
$\mathcal{Q} \sim \mathcal{O}(\mathcal{A})$ and $\mathcal{A} \sim \mathcal{J}(\mathcal{Q})$. This correspondence, called Birkhoff’s Representation 
Theorem [14], holds for all finite ordered sets $\mathcal{A}$. The lattice $\mathcal{Q}$ is called the 
dual of $\mathcal{J}(\mathcal{Q})$, and the lattice $\mathcal{A}$ is called the dual of $\mathcal{O}(\mathcal{A})$. This is of course 
a different notion of duality than I introduced earlier. What is surprising is 
that the join-irreducible map takes lattice products to sums of lattices, so that 
the map $\mathcal{J}$ acts like a logarithm, whereas the map $\mathcal{O}$ acts like the exponential 
function [14]. 
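For the three-atom example, the two maps can be verified by brute force. This is a sketch under stated assumptions rather than a proof: assertions are encoded as frozensets of atoms, the empty down-set is kept as the bottom of the lattice of down-sets (the usual convention when testing join-irreducibility), and the names `is_join_irreducible`, `J_OF_Q`, and `IDEALS` are hypothetical.

```python
from itertools import combinations


def down(x):
    """Down-set of assertion x (an iterable of atoms), absurdity included."""
    atoms = sorted(x)
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )


ASSERTIONS = list(down("akn"))

# O(A): all down-sets, with the empty down-set kept as the bottom element.
DOWN_SETS = []
for r in range(len(ASSERTIONS) + 1):
    for members in combinations(ASSERTIONS, r):
        q = frozenset(members)
        if all(down(x) <= q for x in q):
            DOWN_SETS.append(q)


def is_join_irreducible(q):
    """q is not the bottom and is not the union of two strictly smaller down-sets."""
    if not q:
        return False
    smaller = [p for p in DOWN_SETS if p < q]
    return not any(p1 | p2 == q for p1 in smaller for p2 in smaller)


J_OF_Q = {q for q in DOWN_SETS if is_join_irreducible(q)}
IDEALS = {down(x) for x in ASSERTIONS}

# The map x -> down(x) preserves and reflects the order of the assertions.
order_isomorphic = all(
    (x <= y) == (down(x) <= down(y)) for x in ASSERTIONS for y in ASSERTIONS
)

print(J_OF_Q == IDEALS, len(J_OF_Q), order_isomorphic)
```

The check confirms, for this one lattice, that the join-irreducible down-sets are exactly the ideals and that they are ordered like the assertions that generate them.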


4.5 The Geometry of Questions 


There are some interesting relationships between the lattice of questions and 
geometric constructs based on simplexes. A simplex is the simplest possible 
polytope in a space of given dimension. In zero dimensions, a 0-simplex is 
a point. A 1-simplex is a line segment, which consists of two 0-simplexes 
connected by a line. A 2-simplex is a triangle consisting of three 0-simplexes 
joined by three 1-simplexes, in conjunction with the filled-in interior. The 
3-simplex is a tetrahedron. Finally, the n-simplex is an n-hypertetrahedron. 

Since an (n − 1)-simplex can be used to construct an n-simplex, we can order 
these simplexes with an ordering relation ‘contains’. For example, if two 
0-simplexes, A and B, are used to create a 1-simplex AB, we write A < AB 
and B < AB. We can also define a join of an m-simplex with an n-simplex as 
a geometric object akin to the set union of the two simplexes. Such an object 
is called a simplicial complex. The set of all simplicial complexes formed from 
N distinct 0-simplexes forms the free distributive lattice FD(N) [28]. We 
can identify each n-simplex with an ideal question in the question lattice 
formed by taking the down-set of the join of n assertions. This allows us to 
set up a one-to-one correspondence between the set of questions and the set of 
simplicial complexes. The lattice of questions is thus isomorphic to the lattice 
of simplicial complexes (Figure 4). 

Fig. 4. The lattice of questions (A) is isomorphic to the lattice of simplicial complexes 
(B). The atomic questions A, K, and N are isomorphic to the three 0-simplexes. 
The real questions are isomorphic to the simplicial complexes that include every 
0-simplex in the space. Questions are not only related to these geometric constructs, 
they are also isomorphic to hypergraphs. Since low-order hypergraphs look like 
low-order simplicial complexes, the lattice of hypergraphs with three generators is 
almost identical. The only exception is that instead of the 2-simplex at the top, we 
have the hypergraph connecting the three points (see top of B). 

Another interesting isomorphism can be established. Hypergraphs are graphs 
with generalized edges that can connect more than a single node. By identifying 
each n-simplex with an n-hypergraph, a one-to-one correspondence can 
be made between simplicial complexes and hypergraphs. Thus, a lattice of 
hypergraphs can be constructed, which is isomorphic to both the lattice of 
simplicial complexes and the lattice of questions. The relationship between 
hypergraphs and information theory was noted by Tony Bell [4]. I will show 
that the lattice on which Bell’s co-information is a valuation is precisely the 
question lattice [32]. 
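One way to make the correspondence concrete is to read the maximal answers of a question as the facets of a simplicial complex, or equivalently as the hyperedges of a hypergraph. The sketch below is only an illustration under an assumed encoding (assertions as frozensets of the atoms a, k, n); the helpers `facets` and `from_facets` are hypothetical names.

```python
from itertools import combinations


def down(x):
    """Down-set of assertion x (an iterable of atoms), absurdity included."""
    atoms = sorted(x)
    return frozenset(
        frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)
    )


def facets(question):
    """Maximal assertions of a question: no other member strictly contains them."""
    return {x for x in question if not any(x < y for y in question)}


def from_facets(fs):
    """Rebuild the question as the union of the ideals generated by its facets."""
    return frozenset().union(*(down(f) for f in fs))


T = down("a") | down("k") | down("n")   # three isolated points
B = down("a") | down("kn")              # a point together with an edge

for name, q in (("T", T), ("B", B)):
    # Each facet with m atoms is read as a simplex on m vertices.
    print(name, sorted("".join(sorted(f)) for f in facets(q)))

print(from_facets(facets(T)) == T, from_facets(facets(B)) == B)
```

Because a down-set of a finite ordered set is determined by its maximal elements, storing only the facets loses no information, which is what makes the one-to-one correspondence possible.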


4.6 The Inquiry Calculus 


The algebra of questions provides us with the operations with which questions 
can be manipulated. Given two questions, we can form the common question 
and the joint question using the join and meet respectively. Inclusion on the 
lattice Q indicates whether one question is answered by another. We now 
extend this algebra to a calculus by generalizing inclusion on this lattice to 
degrees of inclusion represented by real numbers just as we did on the Boolean 
lattice. Consider two questions, one of which I will call an outstanding issue 
I, and the other an inquiry Q. The degree to which the inquiry Q answers, or 
resolves, the issue I is a measure of the relevance of the inquiry to the issue. 
This is expressed mathematically by defining 

$$b(I|Q) = z(I, Q), \tag{25}$$

which is explicitly written as 

$$
b(I|Q) =
\begin{cases}
1 & \text{if } I \geq Q \\
0 & \text{if } I \wedge Q = \bot \\
b & \text{otherwise, where } 0 < b < 1.
\end{cases}
\qquad \text{(relevance)} \tag{26}
$$

In this lattice, these numbers indicate the degree to which the question Q 
answers (or resolves) the question (or issue) I. If the degree is low, then the 
inquiry has little relevance to the issue. If it is zero, the inquiry does not 
resolve the issue, and thus is not relevant. For this reason, I call this degree 
the relevance⁵, which I will write as $b(I|Q) = z(I, Q)$. This can be read as the 
degree to which I is answered by Q, which is the same as the degree to which 
I includes Q on the lattice. In practice one would most likely work with real 
questions and compute quantities like $b(T|B)$, which is the degree to which 
asking ‘Did or did not Alice steal the tarts?’ resolves ‘Who stole the tarts?’. 
This quantity $b(T|B)$ measures the relevance of the question B to the issue T. 

5 This degree was called ‘bearing’ by Cox, and Robert Fry adopted the symbol b, 
which is an upside-down p to reflect the relationship with probability. Ariel Caticha 
suggested the name ‘relevance’ since its Latin origin would make it more accessible 
to speakers whose native language was not English. 

The rules of the calculus are straightforward, since they were developed earlier 
for distributive lattices in general and applied to the assertions in the form of 
probability. There is the sum rule for the relevance of a question Q to the join 
of two questions $X \vee Y$ 

$$b(X \vee Y|Q) = b(X|Q) + b(Y|Q) - b(X \wedge Y|Q), \tag{27}$$

and its generalization 

$$
b(X_1 \vee X_2 \vee \cdots \vee X_n|Q) =
\sum_i b(X_i|Q) - \sum_{i<j} b(X_i \wedge X_j|Q) + \sum_{i<j<k} b(X_i \wedge X_j \wedge X_k|Q) - \cdots, \tag{28}
$$

the product rule 

$$b(X \wedge Y|Q) = C\, b(X|Q)\, b(Y|X \wedge Q), \tag{29}$$

and a Bayes’ theorem analog 

$$b(Y|X \wedge Q) = \frac{b(Y|Q)\, b(X|Y \wedge Q)}{b(X|Q)}, \tag{30}$$

where the constant C in the product rule is the value of the relevance $b(\top|X)$. 
Relevances, like probabilities, need not be normalized to one. 

With the rules of the inquiry calculus in hand, and with Rota’s theorem [44, 
Theorem 1, Corollary 2, p.35], we can take relevances assigned to the 
join-irreducible elements $\mathcal{J}(\mathcal{Q})$ (ideal questions) and compute the relevances 
between all pairs of questions on the lattice. However, we need an objective 
means by which to assign these relevances for the ideal questions. 


4.7 Entropy from Consistency 


To assign relevances, we must maintain consistency with both the algebraic 
properties of the question lattice $\mathcal{Q}$ and the probability assignments on the 
Boolean lattice $\mathcal{A}$. While I will outline how this is done below, the detailed 
proofs will be published elsewhere. Clearly from Rota’s theorem, we need only 
determine the relevances of the ideal questions. Once those are assigned, the 
rest follow from the inquiry calculus. 

To determine the form of the relevance, I make a single assumption: 
the degree to which the top question $\top$ answers a join-irreducible question X 
depends only on the probability of the assertion x from which the question X 
was generated. That is, given the ideal question $X = \downarrow x$, 

$$b(X|\top) = H(p(x|\top)), \tag{31}$$

where H is a function to be determined. In this way, the relevance of the ideal 
question is required to be consistent with the probability assignments on $\mathcal{A}$. I 
will outline how the form of the function H can then be determined completely 
from the properties of the inquiry calculus. 

The lattice structure and the inquiry calculus impose important restrictions 
on the behavior of the relevance. Given three questions $X, Y, Q \in \mathcal{Q}$, the 
relevance is additive only when $X \wedge Y = \bot$: 

$$b(X \vee Y|Q) = b(X|Q) + b(Y|Q), \quad \text{iff } X \wedge Y = \bot. \tag{32}$$

However, in general the result is subadditive 

$$b(X \vee Y|Q) \leq b(X|Q) + b(Y|Q). \tag{33}$$

This is a result of the generalized sum rule, which includes additional terms to 
avoid double-counting the overlap between the two questions. Commutativity 
of the join requires that 

$$b(X_1 \vee X_2 \vee \cdots \vee X_n|Q) = b(X_{\pi(1)} \vee X_{\pi(2)} \vee \cdots \vee X_{\pi(n)}|Q) \tag{34}$$

for all permutations $(\pi(1), \pi(2), \cdots, \pi(n))$ of $(1, 2, \cdots, n)$. Thus the relevance 
must be symmetric with respect to the order of the joins. 

Last, we consider what happens when an assertion f, known to be false, is 
added to the system. Associated with this assertion f is a question $F = \downarrow f \in \mathcal{Q}$. 
Now consider the relevance $b(X_1 \vee X_2 \vee \cdots \vee X_n \vee F|Q)$. Since f is known to be 
false, it can be identified with the absurdity $\bot$, and the lattice $\mathcal{A}$ collapses from 
$2^{n+1}$ to $2^n$. The associated question F is then identified with $F = \downarrow\bot = \bot$, 
where it is understood that the first $\bot$ refers to the bottom of the lattice $\mathcal{A}$ 
and the second refers to the bottom of the lattice $\mathcal{Q}$. Since $X \vee \bot = X$, we 
require, for consistency, 

$$b(X_1 \vee X_2 \vee \cdots \vee X_n \vee F|Q) = b(X_1 \vee X_2 \vee \cdots \vee X_n|Q). \tag{35}$$

This requirement is called expansibility. 

I now define a partition question as a real question whose set of answers 
is neatly partitioned. More specifically: 

Definition 2 (Partition Question) A partition question is a real question 
$P \in \mathcal{R}$ formed from the join of a set of ideal questions $P = \bigvee_{i=1}^{n} X_i$ where 

$$\forall\, X_j, X_k \in \mathcal{J}(\mathcal{Q}), \quad X_j \wedge X_k = \bot.$$

I will denote the set of partition questions by $\mathcal{P}$. There are five partition 
questions in our earlier example: $AKN$, $N \cup AK$, $K \cup AN$, $A \cup KN$, and 
$A \cup K \cup N$, which form a lattice isomorphic to the partition lattice $\Pi_3$ in 
Figure 1b. 

For partition questions, the relevance can be easily computed using (32) 

$$b\Big(\bigvee_{i=1}^{n} X_i \,\Big|\, \top\Big) = \sum_{i=1}^{n} H(p(x_i|\top)). \tag{36}$$

Writing this as a function $K_n$ of the n probabilities, we get 

$$b\Big(\bigvee_{i=1}^{n} X_i \,\Big|\, \top\Big) = K_n(p_1, p_2, \cdots, p_n), \tag{37}$$

where I have written $p_i = p(x_i|\top)$. An important result from Aczel et al. 
[2] states that if this function $K_n$ satisfies additivity (32), subadditivity (33), 
symmetry (34), and expansibility (35), then the function can be written as a 
linear combination of the Shannon and Hartley entropies 

$$K_n(p_1, p_2, \cdots, p_n) = a\, H_m(p_1, p_2, \cdots, p_n) + b\, {}_0H_m(p_1, p_2, \cdots, p_n), \tag{38}$$

where a, b are arbitrary non-negative constants, the Shannon entropy [45] is 
defined as 

$$H_m(p_1, p_2, \cdots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i, \tag{39}$$

and the Hartley entropy [21] is defined as 

$${}_0H_m(p_1, p_2, \cdots, p_n) = \log_2 N(P), \tag{40}$$

where $N(P)$ is the number of non-zero arguments $p_i$. An additional condition 
suggested by Aczel states that the Shannon entropy is the only solution if 
the result is to be small for small probabilities [2]. That is, the relevance 
varies continuously as a function of the probability. For the remainder of this 
work, I will assume that this is the case. 

Given these results, it is straightforward to show that the relevance of an ideal 
question (31) can be written as 

$$b(X|\top) = -a\, p(x|\top) \log_2 p(x|\top), \tag{41}$$

which is proportional to the probability-weighted surprise. With this result 
in hand, the inquiry calculus enables us to calculate all the other relevances 
of pairs of questions in the lattice. By requiring consistency with the lattice 
structure and assuming that the relevance of an ideal question is a continuous 
function of the probability of its corresponding assertion, we have found that 
the relevance of a partition question is equal to the Shannon entropy. Thus 

$$b(A \vee K \vee N|\top) \propto -p_a \log_2 p_a - p_k \log_2 p_k - p_n \log_2 p_n, \tag{42}$$

where $p_a = p(a|\top)$, and so on, whereas 

$$b(A \vee KN|\top) \propto -p_a \log_2 p_a - p_{k \vee n} \log_2 p_{k \vee n}, \tag{43}$$

where $p_{k \vee n} = p(k \vee n|\top)$. 
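As a numerical illustration of (41) through (43), the sketch below evaluates the two partition-question relevances for one set of probabilities. The probability values, the choice a = 1, and the function names `surprise_term` and `partition_relevance` are assumptions made for this example only; they do not come from the text.

```python
from math import log2


def surprise_term(p):
    """Probability-weighted surprise -p log2 p, with the convention 0 log 0 = 0."""
    return 0.0 if p == 0 else -p * log2(p)


def partition_relevance(probs, a=1.0):
    """Relevance of a partition question: a times the sum of the ideal-question terms."""
    return a * sum(surprise_term(p) for p in probs)


# Hypothetical probabilities for the three mutually exclusive atomic statements.
p_a, p_k, p_n = 0.5, 0.25, 0.25

# The atoms are mutually exclusive, so p(k or n) = p_k + p_n.
b_T = partition_relevance([p_a, p_k, p_n])     # right-hand side of (42)
b_B = partition_relevance([p_a, p_k + p_n])    # right-hand side of (43)

print(b_T, b_B)
```

With these assumed values the three-way question carries more entropy than the binary one, consistent with T being the finer partition of the possible answers.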

The inquiry calculus allows us to compute the degree to which the question 
T = ‘Who stole the tarts?’ is answered by the question B = ‘Did or did not 
Alice steal the tarts?’ by 

$$
\begin{aligned}
b(T|B) &= b(T|B \wedge \top) \\
       &= b(B|T \wedge \top)\, \frac{b(B|\top)}{b(T|\top)} \\
       &= b(B|T)\, \frac{b(B|\top)}{b(T|\top)} \\
       &= C\, \frac{b(B|\top)}{b(T|\top)},
\end{aligned}
\tag{44}
$$

where C is the chosen normalization constant for the relevance. By assigning 
probabilities to the different cases, this is easily computed using the equations 
(42) and (43) above. 


The relevance of questions such as $AN \cup KN = AN \vee KN$ is even more 
interesting, since this must be computed using the sum rule 

$$b(AN \vee KN|Q) = b(AN|Q) + b(KN|Q) - b(AN \wedge KN|Q), \tag{45}$$

which is equivalent to the mutual information between AN and KN 

$$I(AN; KN) = H(AN) + H(KN) - H(AN, KN), \tag{46}$$

although the information-theoretic notation obscures the conditionality of 
these measures. Thus the relevance of the common question is related to the 
mutual information, which describes what information is shared by the two 
questions. The term $b(AN \wedge KN|Q)$ is then identified as the joint entropy. 
In the context of information theory, Cox’s choice in naming the common 
question and joint question is clearer. 
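The inclusion-exclusion form of (46) can be illustrated with ordinary discrete random variables. The sketch below is generic and assumes a joint distribution supplied as a dictionary keyed by outcome pairs; it does not model the specific questions AN and KN, and the function names are hypothetical.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint distribution."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), the same pattern as the sum rule."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


# Two assumed test cases: a perfectly copied fair bit and two independent fair bits.
copied = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(mutual_information(copied), mutual_information(independent))
```

The subtraction of the joint term plays the same role as the meet term in (45): it removes what would otherwise be counted twice.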


However, the inquiry calculus holds new possibilities. The relevance of 
questions comprised of the join of multiple questions must then be computed using 
the generalized sum rule (28), which is related to the sum rule via the Möbius 
function for the lattice. Combined with the Shannon entropy for relevance, 
this leads to the generalized entropy conjectured by Cox as the appropriate 
measure [12,13]. Furthermore, one can see that this is also the co-information, 
rediscovered by Bell [4], and the lattice on which they are a valuation is 
precisely the question lattice [29]. We now have a well-founded generalization of 
information theory, where the commonalities among a set of any number of 
questions can be quantified. 


5 Discussion 


There are some significant deviations in this work from Cox’s initial explo- 
rations. First, the question algebra is the free distributive algebra — not a 
Boolean algebra as Cox suggested. Cox actually first believed (correctly) that 
the algebra could not possibly be Boolean [12] [pp. 52-3], but then later as- 
sumed that complements of questions exist [13] [pp. 151-2], which led to the 
false conclusion that the algebra must be Boolean. This belief, that questions 
follow a Boolean algebra and therefore possess complements, spread to several 
early papers following Cox, including two of my own. The second major 
deviation is that the ordering relation that I have used for questions is reversed 
from the one implicitly adopted by Cox. This led to a version of the 
consistency relations in Cox’s work that is reversed from the consistency relations 
used in order theory, where joins and meets of questions are swapped. This 
is related to the third deviation, where I have adopted a notation for rele- 
vance that is consistent with the function z from which it derives. Thus, I use 
b(I\Q) to describe the degree to which the issue I is resolved by the question 
Q. This is in keeping with the notation for probability where p(x\t) is the 
degree to which the statement x is implied by the statement t. This is a diffi- 
cult decision, but I feel that mathematical consistency is more important than 
historical consistency — especially at the early stages of development. Never- 
theless, I would like to stress that Cox’s achievement in penetrating the realm 
of questions is remarkable — especially considering the historical focus on the 
logic of assertions and the surprising lack of attention paid to questions. One 
notable exception is a paper by Felix Klein titled ‘What is a Question?’ [29]. 

Cox’s method for deriving probability theory from consistency has both sup- 
porters and critics. It is my experience that many criticisms originate from a 
lack of understanding of how physicists use symmetries, constraints and con- 
sistency to narrow down the form of an unknown function in a problem. In 
fortunate circumstances, this approach leads to a unique solution, or at least 
a useful solution, with perhaps an arbitrary constant. In more challenging sit- 
uations, such as using symmetries to understand the behavior of the strong 
nuclear force, this approach may only exclude possibilities. Other criticisms 
seem to focus on details related to probability and plausibility. By 
deriving these rules for degrees of inclusion defined on distributive lattices, I have 
taken any disagreement to a wider arena, specifically one where probability 
and plausibility are no longer the issue. The fact that the application of this 
methodology of deriving laws from ordering relations [34] leads to probability 
theory, information theory, geometric probability [43,28], quantum 
mechanics [10], and perhaps to measures in new spaces [34] suggests that there are 
important insights to be gained here. 

It is remarkable that Cox’s insight led him to correctly conjecture the rela- 
tionship between relevance and probability as a generalized entropy [12,13]. 
Generalizations to information theory go back to its inception [40], and ap- 
peared recently in the form of Bell’s co-informations, which he realized were 
related to lattice structures [4]. Bell’s co-information is precisely Cox’s gener- 
alized entropy, and in this paper I have shown that the lattice on which they 
operate is precisely the question lattice. 

In retrospect, the relationship between questions and information theory is 
not surprising. For example, the design of a communication channel can be 
viewed as designing a question to be asked of a transmitter. Experimental 
design, which is also a form of question-asking, has relied on entropy [36,15,38]. 
Active learning has made experimental design an active process and entropy 
has found a role here as well alongside Bayesian inference [39,37]. Question- 
asking is also important when searching a space for an optimal solution [24,41]. 
Generalizations of information theory have found use in searching for causal 
interactions among a set of variables. Transfer entropy is designed to extend 
mutual information to address the asymmetric issue of causality [26]. Given 
two time-series, X and Y, the transfer entropy can be neatly expressed in 
terms of the relevance of the common question $X_{i+1} \vee Y_i$ minus the relevance of 
$X_i \vee X_{i+1} \vee Y_i$, where $X_i$ = ‘What is the value of the time series X at the 
time i?’ Last, Robert Fry has been working to extend the inquiry calculus to 
cybernetic control [18] and neural network design [16,19]. Given the scope of 
these applications, it will be interesting to see the implications that a detailed 
understanding of the inquiry calculus will have on future developments. 

It is already known that some problems can be solved in both the space of 
assertions and the space of questions. For example, the Infomax Independent 
Component Analysis (ICA) algorithm [5] is a machine learning algorithm 
that was originally derived using information theory. However, by considering 
a source separation problem in terms of the logical statements describing 
the physical situation, one can derive ICA using probability theory [30]. The 
information-theoretic derivation can be interpreted in terms of maximizing the 
relevance of the common question $X \vee Y$, where X = ‘What are the recorded 
signals?’ and Y = ‘How well have we modelled the source activity?’ This is 
accomplished by using the prior probability distribution of the source 
amplitudes to encode how well the sources have been modelled [31].⁶ This notion 
of encoding answers to questions is important to inductive inquiry. 

6 Note that complements of questions are used erroneously and unnecessarily in 

In this paper, I have laid out the relationship between the algebra of a finite 
set of logical statements and the algebra of the corresponding questions. The 
Boolean lattice of assertions A gives rise to the free distributive lattice of 
questions Q as the ordered set of down-sets of the assertion lattice. The join- 
irreducible elements of the question lattice form a lattice that is isomorphic 
to the original assertion lattice. Thus the assertion lattice A is dual to the 
question lattice $\mathcal{Q}$ in the sense of Birkhoff’s representation theorem. 
Furthermore, I showed that the question lattice is isomorphic to both the lattice of 
simplicial complexes and the lattice of hypergraphs, which connects questions 
to geometric constructs. By generalizing the zeta function on each lattice, I 
have demonstrated that their algebras can be generalized to calculi, which 
effectively enable us to measure statements and questions. Probability theory 
is the calculus on $\mathcal{A}$, and as such it is literally an extension of Boolean logic. 
Cox’s generalized entropies, which are called co-informations by Bell, quantify 
the relevance of a question on an issue. Traditional information theory is thus 
only a portion of the inquiry calculus, which now offers new possibilities. This 
formalism must now be extended to continuous spaces. An understanding of 
these fundamental relationships will be essential to utilizing the full power of 
these mathematical constructs. 

APPENDIX: Deriving the rules of the calculi 


In this appendix, I derive the sum rule, product rule, and Bayes’ theorem 
analog for distributive lattices. These rules are equally applicable to probability 
on the Boolean lattice of assertions $\mathcal{A}$, to relevance on the free distributive 
lattice of questions $\mathcal{Q}$, and any other distributive lattice. The first derivation 
of these rules was by Cox for complemented distributive lattices (Boolean 
lattices) [11,12]. The derivations rely on maintaining consistency between the 
proposed calculus and the properties of the algebra. In Cox’s derivations, he 
relied on consistency with complementation to obtain the sum rule, and 
consistency with associativity of the meet to obtain the product rule. The derivation 
of Bayes’ theorem is, in contrast, well-known since it follows directly from 
commutativity of the meet. An interesting variation of Cox’s derivation for 
Boolean algebra relying on a single algebraic operation (NAND) was 
introduced by Anthony Garrett [20]. 

Below, I expound on the derivations introduced by Ariel Caticha, which rely on 
associativity and distributivity [10] . The implications of Caticha’s derivation 
are profound, since his results imply that the sum rule, the product rule, 
and Bayes’ theorem are consistent with distributive lattices in general. These 
implications are discussed in greater detail elsewhere [34]. 


Consistency with Associativity 


Consider a distributive lattice $\mathcal{D}$, two join-irreducible elements $a, b \in \mathcal{J}(\mathcal{D})$, 
where $a \wedge b = \bot$, and a third element $t \in \mathcal{D}$ such that $a \wedge t \neq \bot$ and $b \wedge t \neq \bot$. 
We begin by introducing a degree of inclusion (see Eqn. 6) represented by the 
function $\phi$, so that the degree to which a includes t is given by $\phi(a, t)$. We 
would like to be able to compute the degree to which the join $a \vee b$ includes t. 
In terms of probability, this is the degree to which t implies $a \vee b$. 

Since $a \wedge b = \bot$, this degree of inclusion can only be a function of $\phi(a, t)$ and 
$\phi(b, t)$, which can be written as 

$$\phi(a \vee b, t) = S(\phi(a, t), \phi(b, t)). \tag{A-1}$$

The function S will tell us how to use $\phi(a, t)$ and $\phi(b, t)$ to compute $\phi(a \vee b, t)$. 
The hope is that the consistency constraint will be sufficient to identify the 
form of S; we will see that this is the case. 

The function S must maintain consistency with the distributive algebra $\mathcal{D}$. 
Consider another join-irreducible element $c \in \mathcal{J}(\mathcal{D})$ where $a \wedge c = \bot$, $b \wedge c = \bot$, 
and form the element $(a \vee b) \vee c$. We can use associativity of the lattice to 
write this element two ways 

$$(a \vee b) \vee c = a \vee (b \vee c). \tag{A-2}$$

Consistency requires that each expression gives the same result when the 
degree of inclusion is calculated 

$$S(\phi(a \vee b, t), \phi(c, t)) = S(\phi(a, t), \phi(b \vee c, t)). \tag{A-3}$$

Applying S to the arguments $\phi(a \vee b, t)$ and $\phi(b \vee c, t)$ above, we get 

$$S(S(\phi(a, t), \phi(b, t)), \phi(c, t)) = S(\phi(a, t), S(\phi(b, t), \phi(c, t))). \tag{A-4}$$

This can be further simplified by letting $u = \phi(a, t)$, $v = \phi(b, t)$, and 
$w = \phi(c, t)$, resulting in a functional equation for the function S, which Aczel called 
the associativity equation [1, pp. 253-273]: 

$$S(S(u, v), w) = S(u, S(v, w)). \tag{A-5}$$

The general solution [1] is 

$$S(u, v) = f(f^{-1}(u) + f^{-1}(v)), \tag{A-6}$$

where f is an arbitrary function. This is simplified by letting $g = f^{-1}$: 

$$g(S(u, v)) = g(u) + g(v). \tag{A-7}$$

Writing this in terms of the original expressions we get 

$$g(\phi(a \vee b, t)) = g(\phi(a, t)) + g(\phi(b, t)), \tag{A-8}$$

which reveals that there exists a function $g : \mathbb{R} \to \mathbb{R}$ re-mapping these numbers 
to a more convenient representation. Defining $z(a, t) = g(\phi(a, t))$ we get 

$$z(a \vee b, t) = z(a, t) + z(b, t), \tag{A-9}$$

which is the sum rule for the join of two join-irreducible elements. 
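A quick numerical check can make the role of the re-mapping g concrete. The sketch below builds S from an invertible f as in (A-6) and tests (A-5) and (A-7) at one sample point; the two choices of f (the exponential and the cube) and the helper name `make_S` are assumptions picked for illustration, not functions singled out by the derivation.

```python
from math import exp, isclose, log


def make_S(f, f_inv):
    """Build S(u, v) = f(f_inv(u) + f_inv(v)), the form given in (A-6)."""
    def S(u, v):
        return f(f_inv(u) + f_inv(v))
    return S


# Two assumed examples of f on the positive reals.
S_product = make_S(exp, log)                                # reduces to u * v
S_cubic = make_S(lambda x: x ** 3, lambda y: y ** (1 / 3))

u, v, w = 0.2, 0.5, 0.7

# Associativity equation (A-5), checked at one sample point for each S.
associative = all(
    isclose(S(S(u, v), w), S(u, S(v, w))) for S in (S_product, S_cubic)
)

# With g = f_inv, the combination rule becomes ordinary addition, as in (A-7).
additive = isclose(log(S_product(u, v)), log(u) + log(v))

print(associative, additive)
```

Any such S is associative by construction, which is why the derivation is free to work with the additive representation z instead of the original degrees.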

This rule can be extended to all elements in $\mathcal{D}$ by using the Möbius function 
for the lattice, or equivalently the inclusion-exclusion relation, which avoids 
double-counting the elements in the calculation [28,42,3,34]. This leads to the 
generalized sum rule for the join of two arbitrary elements 

$$z(a \vee b, t) = z(a, t) + z(b, t) - z(a \wedge b, t), \tag{A-10}$$

and 

$$
z(x_1 \vee x_2 \vee \cdots \vee x_n, t) =
\sum_i z(x_i, t) - \sum_{i<j} z(x_i \wedge x_j, t) + \sum_{i<j<k} z(x_i \wedge x_j \wedge x_k, t) - \cdots \tag{A-11}
$$

for the join of multiple arbitrary elements $x_1, x_2, \ldots, x_n$ [34]. 
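The alternating pattern in (A-11) is the familiar inclusion-exclusion relation. As a sketch, the code below checks it on the lattice of finite sets, with set cardinality standing in for the valuation z; that stand-in, the sample sets, and the function name `inclusion_exclusion` are assumptions for illustration, and the context element t is omitted.

```python
from functools import reduce
from itertools import combinations


def inclusion_exclusion(sets, z=len):
    """Alternating sum of z over the meets (intersections) of all non-empty
    sub-collections, following the sign pattern of (A-11)."""
    total = 0
    for r in range(1, len(sets) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(sets, r):
            total += sign * z(reduce(frozenset.intersection, group))
    return total


# Three overlapping sets chosen arbitrarily for the check.
sets = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({1, 4, 5})]
join = frozenset().union(*sets)

print(inclusion_exclusion(sets), len(join))
```

Cardinality is additive on disjoint sets, so it satisfies the two-element sum rule, and the generalized rule then follows for any number of joined elements.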


Consistency with Distributivity 


Given $x, y, t \in \mathcal{D}$, we would like to be able to compute the degree to which the 
meet $x \wedge y$ includes t, written $z(x \wedge y, t)$. We can easily use (A-10) to obtain 

$$z(x \wedge y, t) = z(x, t) + z(y, t) - z(x \vee y, t). \tag{A-12}$$

However, another form can be found by requiring consistency with 
distributivity D1. Following Cox [11,12], and relying on the consistency arguments 
given by Jaynes [25], Tribus [47], and Smith and Erickson [46], this degree can 
be written two ways as a function P of two arguments 

$$z(x \wedge y, t) = P(z(x, t), z(y, x \wedge t)) = P(z(y, t), z(x, y \wedge t)), \tag{A-13}$$

where the function P will tell us how to do the calculation. The two expressions 
on the right are a consequence of commutativity, which we will address later. 

We will focus for now on the first expression of P, and consider five elements 
$a, b, r, s, t \in \mathcal{D}$ where $a \wedge b = \bot$ and $r \wedge s = \bot$. By considering distributivity 
D1 of the meet over the join, we can write $a \wedge (r \vee s)$ two ways 

$$a \wedge (r \vee s) = (a \wedge r) \vee (a \wedge s). \tag{A-14}$$

Consistency with distributivity D1 requires that their relevances calculated 
these two ways are equal. Using the sum rule (A-10) and the form of P (A-13), 
distributivity requires that 

$$P(z(a, t), z(r \vee s, a \wedge t)) = z(a \wedge r, t) + z(a \wedge s, t), \tag{A-15}$$

which simplifies to 

$$
P(z(a, t), z(r, a \wedge t) + z(s, a \wedge t)) =
P(z(a, t), z(r, a \wedge t)) + P(z(a, t), z(s, a \wedge t)). \tag{A-16}
$$

Defining $u = z(a, t)$, $v = z(r, a \wedge t)$, and $w = z(s, a \wedge t)$, the equation above 
can be written as 

$$P(u, v + w) = P(u, v) + P(u, w). \tag{A-17}$$

This functional equation for P captures the essence of distributivity D1. 

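We will now show that $P(u, v + w)$ is linear in its second argument. Defining
$k = w + v$, and writing (A-17) as

$$P(u, k) = P(u, v) + P(u, w), \tag{A-18}$$

we can compute the second derivative with respect to $k$. Using the chain rule
we find that

$$\frac{\partial}{\partial v} = \frac{\partial k}{\partial v}\frac{\partial}{\partial k} = \frac{\partial}{\partial k} = \frac{\partial k}{\partial w}\frac{\partial}{\partial k} = \frac{\partial}{\partial w}, \tag{A-19}$$

so that the second derivative with respect to $k$ can be written as

$$\frac{\partial^2}{\partial k^2} = \frac{\partial}{\partial v}\frac{\partial}{\partial w}. \tag{A-20}$$

Applying this operator to the right-hand side of (A-18), where each term depends
on only one of $v$ and $w$, the second derivative of $P(u, k)$ with respect to
$k$ is easily shown to be

$$\frac{\partial^2}{\partial k^2} P(u, k) = 0, \tag{A-21}$$

which implies that the function $P$ is linear in its second argument

$$P(u, v) = A(u)v + B(u), \tag{A-22}$$

where $A$ and $B$ are functions to be determined. Substitution of (A-22) into
(A-17) gives $B(u) = 0$.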
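
The substitution can be written out explicitly. Inserting the linear form
(A-22) into both sides of (A-17) gives

```latex
\begin{align*}
A(u)(v + w) + B(u) &= \bigl[A(u)v + B(u)\bigr] + \bigl[A(u)w + B(u)\bigr] \\
                   &= A(u)(v + w) + 2B(u),
\end{align*}
```

so that $B(u) = 2B(u)$, which can hold for all $u$ only if $B(u) = 0$.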

Now we consider $(a \vee b) \wedge r$, which using D1 can be written as

$$(a \vee b) \wedge r = (a \wedge r) \vee (b \wedge r). \tag{A-23}$$

This gives a similar functional equation

$$P(v + w, u) = P(v, u) + P(w, u), \tag{A-24}$$

where $u = z(r, t)$, $v = z(a, r \wedge t)$, $w = z(b, r \wedge t)$. Following
the approach above, we see that $P$ is also linear in its first argument

$$P(u, v) = A(v)u. \tag{A-25}$$

Together with (A-22), where $B(u) = 0$, this requires $A(u)$ to be proportional
to $u$, so the general solution is

$$P(u, v) = Cuv, \tag{A-26}$$

where $C$ is an arbitrary constant. Thus we have the product rule

$$z(x \wedge y, t) = C z(x, t) z(y, x \wedge t), \tag{A-27}$$


which tells us the degree to which the meet $x \wedge y$ includes $t$. The
constant $C$ acts as a normalization factor, and is necessary when these
valuations are normalized to values other than unity. It should be noted that
this only satisfies the distributivity of the meet over the join, D1. The
reasons for preferring D1 over D2 are related to the lattice product and are
discussed elsewhere [34].
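
The role of the constant $C$ can be illustrated with a small numerical sketch.
It assumes, purely for illustration, a bi-valuation on the subsets of a finite
set in which $z(x, t)$ is the fraction of $t$ contained in $x$, multiplied by a
hypothetical normalization `SCALE` (here 100, as for percentages). Under that
assumption the product rule (A-27) holds with $C = 1/\mathrm{SCALE}$. The names
`z`, `SCALE`, and `C`, and the choice of lattice, are not taken from the
derivation above.

```python
from fractions import Fraction
from itertools import combinations

SCALE = 100              # assumed normalization: z(t, t) == SCALE instead of 1
C = Fraction(1, SCALE)   # the constant the product rule then requires

def z(x, t):
    """Illustrative bi-valuation: SCALE times the fraction of t contained in x."""
    return SCALE * Fraction(len(x & t), len(t))

universe = frozenset(range(4))
# Subsets of the universe, with meet = intersection (&).
subsets = [frozenset(c) for r in range(len(universe) + 1)
           for c in combinations(sorted(universe), r)]

def product_rule_holds(x, y, t):
    # (A-27): z(x ∧ y, t) = C z(x, t) z(y, x ∧ t)
    return z(x & y, t) == C * z(x, t) * z(y, x & t)

# Only triples with a non-empty x ∧ t are checked, because z(y, x ∧ t)
# is undefined when its second argument is the empty set.
product_ok = all(product_rule_holds(x, y, t)
                 for x in subsets for y in subsets for t in subsets
                 if x & t)
print(product_ok)
```

With `SCALE = 1` the constant reduces to $C = 1$, which is the familiar
normalization of probability.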


Consistency with Commutativity 


Commutativity of the meet is the reason that there are two forms for the
function $P$ in (A-13), and hence two forms of the product rule:

$$z(x \wedge y, t) = C z(x, t) z(y, x \wedge t) \tag{A-28}$$

and

$$z(y \wedge x, t) = C z(y, t) z(x, y \wedge t). \tag{A-29}$$

Equating the degrees (A-28) and (A-29) results in

$$C z(x, t) z(y, x \wedge t) = C z(y, t) z(x, y \wedge t), \tag{A-30}$$

which leads to Bayes’ theorem

$$z(y, x \wedge t) = \frac{z(y, t)\, z(x, y \wedge t)}{z(x, t)}. \tag{A-31}$$


This demonstrates that there is a sum rule, a product rule, and an analog of
Bayes’ theorem for bi-valuations on all distributive lattices. This realization
clears up the mystery as to why some quantities in science act like
probabilities, but clearly are not probabilities [34].
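
As a sanity check on (A-31) outside the usual probabilistic setting, the sketch
below evaluates both sides on a distributive lattice that is not Boolean: the
divisors of 60 ordered by divisibility, with the meet given by the greatest
common divisor. The bi-valuation is an assumption chosen for illustration: it
takes $z(x, t)$ to be the number of prime factors (counted with multiplicity)
of $x \wedge t$ divided by the number of prime factors of $t$. The names
`omega`, `z`, and `N` are hypothetical helpers, not part of the derivation.

```python
from fractions import Fraction
from math import gcd

def omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, p = 0, 2
    while n > 1:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count

def z(x, t):
    """Illustrative bi-valuation on divisors: share of t's prime factors
    that also appear in gcd(x, t). Requires t > 1."""
    return Fraction(omega(gcd(x, t)), omega(t))

N = 60
divisors = [d for d in range(1, N + 1) if N % d == 0]

def bayes_holds(x, y, t):
    # (A-31): z(y, x ∧ t) = z(y, t) z(x, y ∧ t) / z(x, t), with meet = gcd
    return z(y, gcd(x, t)) == z(y, t) * z(x, gcd(y, t)) / z(x, t)

# Skip triples in which x ∧ t or y ∧ t is the bottom element 1, where the
# valuation vanishes and the ratios are undefined.
bayes_ok = all(bayes_holds(x, y, t)
               for x in divisors for y in divisors for t in divisors
               if gcd(x, t) > 1 and gcd(y, t) > 1)
print(bayes_ok)
```

This valuation counts prime factors rather than measuring a probability, yet
under the stated assumptions it obeys the same relation.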